Abstract

We introduce a model of generalized Hebbian learning and retrieval 
in oscillatory neural networks modeling cortical areas such as hippocam- 
pus and olfactory cortex. Recent experiments have shown that synaptic 
plasticity depends on spike timing, especially on synapses from excitatory 
pyramidal cells, in hippocampus and in sensory and cerebellar cortex. 
Here we study how such plasticity can be used to form memories and 
input representations when the neural dynamics are oscillatory, as is com- 
mon in the brain (particularly in the hippocampus and olfactory cortex). 
Learning is assumed to occur in a phase of neural plasticity, in which the 
network is clamped to external teaching signals. By suitable manipula- 
tion of the nonlinearity of the neurons or of the oscillation frequencies 
during learning, the model can be made, in a retrieval phase, either to 
categorize new inputs or to map them, in a continuous fashion, onto the 
space spanned by the imprinted patterns. We identify the first of these 
possibilities with the function of olfactory cortex and the second with the 
observed response characteristics of place cells in hippocampus. We inves- 
tigate both kinds of networks analytically and by computer simulations, 
and we link the models with experimental findings, exploring, in particu- 
lar, how the spike timing dependence of the synaptic plasticity constrains 
the computational function of the network and vice versa. 
1 Introduction



It has long been known that the brain is a dynamical system in which non- 
static activities are common. In particular, oscillatory neural activity has been 
observed and is believed to play significant functional roles in, for example, the 
hippocampus and the olfactory cortex. The inputs to these areas can be oscilla- 
tory, and the intra-areal connections also make these systems prone to intrinsic 
oscillatory dynamics. Networks of interacting excitatory and inhibitory neurons 
(E-I networks) are ubiquitous in the brain, and oscillatory activity is not un- 
expected in such networks because of the intrinsically asymmetric character of 
the interactions between excitatory and inhibitory cells. Recent experimental 
findings further underscored the importance of dynamics by showing that
long-term changes in synaptic strengths depend on the relative timing of pre-
and postsynaptic firing [?]. For instance, in neocortical and hippocampal
pyramidal cells [?], the synaptic strength increases (long-term
potentiation (LTP)) or decreases (long-term depression (LTD)), depending on
whether the presynaptic spike precedes or follows the postsynaptic one. This 
synaptic modification is largest for differences between pre- and postsynaptic 
spike times of order 10 ms. Since the scale of this relative timing is compara- 
ble to the period of neural oscillations, the oscillatory dynamics should affect 
the resulting synaptic modifications. In particular, the relative phases between 
the oscillating neurons ought to constrain the synaptic changes that can oc- 
cur. These synaptic strengths should, in turn, determine the nature of the 
network dynamics. This interplay seems likely to have significant functional 
consequences. 

While we have achieved some understanding of the computational power of
oscillatory networks (see [?] and references therein), they are poorly
understood in comparison with networks that always converge to static states,
such as feedforward networks or recurrent networks with symmetric connections.
In particular, while we know a lot about appropriate learning algorithms for
associative memory in symmetrically connected and feedforward networks [?], there
is little previous work on learning, in the context of the synaptic physiological 
findings mentioned above, in asymmetrically connected networks with oscilla- 
tory dynamics. In this paper, we introduce a model for spike-timing dependent 
learning in an oscillatory neural network and show how such a network can 
perform associative memory or input representation after learning. 

The experimental findings dictate the general form of our model. It is an 
E-I (excitatory-inhibitory) network, with asymmetric interactions between ex- 
citatory and inhibitory cells, that exhibits input-driven oscillatory activity. We 
describe the long-term synaptic changes induced by a pair of pre- and
postsynaptic spikes at times $t_{\rm pre}$ and $t_{\rm post}$ by a function,
which we denote $A(\tau)$, of the difference in spike times
$\tau = t_{\rm post} - t_{\rm pre}$. Hence, $A(\tau)$ is positive or negative
for LTP or LTD at a particular value of $\tau$. According to the experiments
[?], $A(\tau)$ varies in different preparations. For instance, the synapses
between hippocampal pyramidal cells have $A(\tau) > 0$ when $\tau > 0$ and
$A(\tau) < 0$ when $\tau < 0$. We will consider a general $A(\tau)$ in order to
be able to explore the consequences of different forms of this function which
may be relevant to different areas or conditions. We study analytically and by
simulation how oscillatory activities influence synaptic changes and how these
changes influence the network oscillations and their functions. In particular,
we ask the following questions: (1) How can the system function as an
associative memory or as a substrate for a map of an input space? (2) What
constraints do these functions place on the form of $A(\tau)$? (3) What
constraints would particular experimental findings about $A(\tau)$ impose on
the function of networks like this one?

In the next section we present the model E-I network and describe its dy- 
namics for arbitrary synaptic strengths, making use of a linearized analysis. 
Section 3 then applies the spike-timing-dependent synaptic dynamics to the fir- 
ing states evoked by oscillatory input. We obtain general expressions for the 
resulting learned synaptic strengths and use the linearized theory to study the 
response properties of the network. We show how the learning rates can be 
adjusted so that, after learning, the network responds strongly (resonantly) to 
inputs similar to those used to drive it during learning and weakly to unlearned 
inputs. In addition to this pattern tuning, the system exhibits tuning with 
respect to driving frequency: the response is weakened for driving frequencies 
different from that used during learning. 

We show further that, depending on the kind of nonlinearity in the neu- 
ronal input-output function, the model can perform two qualitatively different 
kinds of computations. One is associative memory, in which an input to the 
network is categorized by identifying it with the learned pattern most similar 
to it. (Olfactory cortex is believed to operate in something like this way.) The 
other is to make a representation of the input pattern as a continuous mapping 
onto the space spanned by the learned patterns. For this mode of operation, 
which we call "input representation", it is not necessary for the network to have
learned explicitly all patterns to which it should respond; it performs a kind of 
interpolation between a much smaller number of prototypes. (In the hippocam- 
pus, we identify these prototypes with place cell fields.) Section 4 presents the 
nonlinear analysis of the network for both these cases. In Section 5 we examine 
the consequences of various possible constraints on the signs and plasticities of 
the synapses. Despite the primitive character of the model we use, we believe 
our findings may have relevance to the dynamics of many cortical areas. In the 
final section we discuss our results in the context of other modeling and exper- 
imental findings, indicating some interesting directions, both experimental and 
theoretical, for future work. 

2 The model 

We base our model on one formulated recently by two of us [?] to describe
olfactory cortex. For completeness, we summarize its main features here. In the
brain regions we model, hippocampus or olfactory cortex, pyramidal cells make
long range connections both to other pyramidal cells and to inhibitory
interneurons, while inhibitory interneurons generally only project locally
(Fig. 1A). The elementary module of the system is an E-I pair consisting of one excitatory
and one inhibitory unit, mutually connected. Each such unit represents a local 
assembly of pyramidal cells or local interneurons sharing common, or at least 
highly correlated, input. (The number of neurons represented by the excitatory 
units is in general different from the number represented by the inhibitory units.) 
Such E-I pairs, without connections between them, form independent damped 
local oscillators. The connections between units in different pairs, which we 
term long range connections, are subject to learning or plasticity in this study. 
They couple the pairs and determine the normal modes of the coupled-oscillator 
system. The input to the system is an oscillating pattern $I_i$ driving the
excitatory units. It models the bulb input to the pyramidal cells in the
olfactory cortex or the input from the entorhinal cortex (perforant path) and
the dentate gyrus (mossy fibers) to CA3 pyramidal cells. The system outputs are from the
excitatory cells. 

The state variables, modeling the membrane potentials, are
$\mathbf{u} = \{u_1, \ldots, u_N\}$ and $\mathbf{v} = \{v_1, \ldots, v_N\}$,
respectively, for the excitatory and inhibitory units. (We denote vectors by
bold font.) The unit outputs, representing the probabilities of the cells
firing (or instantaneous firing rates), are given by
$g_u(u_1), \ldots, g_u(u_N)$ and $g_v(v_1), \ldots, g_v(v_N)$, where $g_u$ and
$g_v$ are sigmoidal activation functions that model the neuronal input-output
relations. The equations of motion are

$$\dot u_i = -\alpha u_i - \beta_i^0 g_v(v_i) + \sum_j J^0_{ij}\, g_u(u_j) + I_i, \tag{1}$$

$$\dot v_i = -\alpha v_i + \gamma_i^0 g_u(u_i) + \sum_{j \neq i} W^0_{ij}\, g_u(u_j), \tag{2}$$

where $\alpha^{-1}$ is the membrane time constant (for simplicity assumed the
same for excitatory and inhibitory units), $J^0_{ij}$ is the synaptic strength
from excitatory unit $j$ to excitatory unit $i$, $W^0_{ij}$ is the synaptic
strength from excitatory unit $j$ to inhibitory unit $i$, $\beta_i^0$ and
$\gamma_i^0$ are the local inhibitory and excitatory connections within the E-I
pair $i$, and $I_i(t)$ is the net input from other parts of the brain. We omit
inhibitory connections between pairs here, since the real anatomical
long-range connections appear to come predominantly from excitatory cells. (The
parameter $\gamma_i^0$ could be identified as $W^0_{ii}$ and the term
$\gamma_i^0 g_u(u_i)$ absorbed into the following sum over $j$, but for later
convenience, we have written this local term explicitly.) All these parameters
are non-negative; the inhibitory character of the second term on the right-hand
side of Eqn. (1) is indicated by the minus sign preceding it.
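
As an illustration of Eqns. (1) and (2), the following minimal sketch integrates a single uncoupled E-I pair with forward Euler. The tanh activation, the parameter values (arbitrary units) and all function names are assumptions chosen for illustration; they are not taken from the simulations reported here.

```python
import math

def g(x):
    # Assumed sigmoid with unit slope at the origin (illustrative choice).
    return math.tanh(x)

def rhs(u, v, inp, J0, W0, alpha, beta0, gamma0):
    """Right-hand sides of Eqns. (1) and (2) for N E-I pairs."""
    n = len(u)
    gu = [g(x) for x in u]
    du = [-alpha * u[i] - beta0 * g(v[i])
          + sum(J0[i][j] * gu[j] for j in range(n)) + inp[i]
          for i in range(n)]
    dv = [-alpha * v[i] + gamma0 * gu[i]
          + sum(W0[i][j] * gu[j] for j in range(n) if j != i)
          for i in range(n)]
    return du, dv

def simulate(u, v, inp, J0, W0, alpha, beta0, gamma0, dt, steps):
    """Forward-Euler integration; returns the trajectory of u[0]."""
    u, v = list(u), list(v)
    traj = [u[0]]
    for _ in range(steps):
        du, dv = rhs(u, v, inp, J0, W0, alpha, beta0, gamma0)
        u = [x + dt * d for x, d in zip(u, du)]
        v = [x + dt * d for x, d in zip(v, dv)]
        traj.append(u[0])
    return traj

# One uncoupled pair with no input: a damped oscillation around the fixed point.
traj = simulate([0.1], [0.0], [0.0], [[0.0]], [[0.0]],
                alpha=0.2, beta0=1.0, gamma0=1.0, dt=0.01, steps=5000)
```

With these assumed values the excitatory variable swings through zero and decays, which is the damped local oscillator behavior described for an isolated E-I pair.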

2.1 Linearization

Figure 1: A: The model elements. Input is to the excitatory units $u$, which
also provide the network output. There are local excitatory-inhibitory
connections (vertical solid lines) and nonlocal connections (indicated by
dashed lines) between the excitatory units ($J_{ij}$) and from excitatory to
inhibitory units ($W_{ij}$). B, C, D: The activation functions for model
neurons. Class I and class II nonlinearities are shown in B and C,
respectively. Crosses mark the equilibrium point $(\bar u, \bar v)$ of the
system (see Sect. 2.1) used in our numerical simulations. The slopes of all
activation functions used in these calculations are taken to be 1 at the
equilibrium point. E, F, G: An example of kernel shape $A(\tau)$ and the real
and imaginary parts of its Fourier transform.

The static part $\bar{\mathbf{I}}$ of the input determines a fixed point
$(\bar{\mathbf{u}}, \bar{\mathbf{v}})$, given by the solution of the equations
$\dot{\mathbf{u}} = 0$, $\dot{\mathbf{v}} = 0$ with
$\mathbf{I} = \bar{\mathbf{I}}$. Linearizing Eqns. (1) and (2) around the fixed
point leads to

$$\dot u_i = -\alpha u_i - \beta_i v_i + \sum_j J_{ij} u_j + \delta I_i, \qquad
\dot v_i = -\alpha v_i + \gamma_i u_i + \sum_{j \neq i} W_{ij} u_j, \tag{3}$$

where $u_i$ and $v_i$ are now measured from their fixed-point values,
$\delta\mathbf{I} \equiv \mathbf{I} - \bar{\mathbf{I}}$,
$\beta_i = g_v'(\bar v_i)\beta_i^0$, $\gamma_i = g_u'(\bar u_i)\gamma_i^0$,
$W_{ij} = g_u'(\bar u_j) W^0_{ij}$, and $J_{ij} = g_u'(\bar u_j) J^0_{ij}$.
Henceforth, for simplicity, we assume $\beta_i = \beta$, $\gamma_i = \gamma$,
independent of $i$.

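Eliminating the $v_i$ from (3), we have the second-order differential equations

$$[(\partial_t + \alpha)^2 + \beta\gamma]\,\mathbf{u} = \mathsf{M}\mathbf{u} + (\partial_t + \alpha)\,\delta\mathbf{I}, \tag{4}$$

where

$$\mathsf{M} = (\partial_t + \alpha)\mathsf{J} - \beta\mathsf{W}, \tag{5}$$

or, equivalently,

$$\ddot{\mathbf{u}} + (2\alpha - \mathsf{J})\dot{\mathbf{u}} + [\alpha^2 - \alpha\mathsf{J} + \beta(\gamma + \mathsf{W})]\,\mathbf{u} = (\partial_t + \alpha)\,\delta\mathbf{I}. \tag{6}$$

(We use sans serif to denote matrices.)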

Given a stable fixed point, an oscillatory drive
$\delta\mathbf{I} = \delta\mathbf{I}^+ + \delta\mathbf{I}^-$, where
$\delta\mathbf{I}^+ \propto e^{-i\omega t}$ and
$\delta\mathbf{I}^- = \delta\mathbf{I}^{+*}$, will lead eventually to a
sustained oscillatory response $\mathbf{u} = \mathbf{u}^+ + \mathbf{u}^-$ with
the same frequency $\omega$, with $\mathbf{u}^+ \propto e^{-i\omega t}$ and
$\mathbf{u}^- = \mathbf{u}^{+*}$. Then from Eqn. (4),

$$[(-i\omega + \alpha)^2 + \beta\gamma]\,u_i^+ = \sum_j M_{ij} u_j^+ + (\alpha - i\omega)\,\delta I_i^+, \tag{7}$$

where

$$\mathsf{M} = (\alpha - i\omega)\mathsf{J} - \beta\mathsf{W} \tag{8}$$

is now the $\mathsf{M}$ of Eqn. (5) applied to the $e^{-i\omega t}$ modes. The
terms in the square bracket describe the local E-I pair contribution, while
$\mathsf{M}$ gives an effective coupling between the oscillating E-I pairs. A
zero $\mathsf{M}$ makes $\mathbf{u}$ proportional to $\delta\mathbf{I}$ with a
constant phase shift, i.e., each individual damped oscillator is driven
independently by a component of the external drive. Learning imprints patterns
into $\mathsf{M}$ through the long range connections $\mathsf{J}$ and
$\mathsf{W}$. After learning, $\mathbf{u}$ depends on how $\delta\mathbf{I}^+$
decomposes into the eigenvectors of $\mathsf{M}$. Thus the network can
selectively amplify or distort $\delta\mathbf{I}$ in an
imprinted-pattern-specific manner and thereby function as an associative
memory or input representation.
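
The uncoupled case (zero $\mathsf{M}$) of Eqn. (7) can be evaluated directly. The sketch below, with assumed parameter values in arbitrary units and a hypothetical function name, computes the complex ratio $u_i^+/\delta I_i^+$ and shows that a single damped E-I oscillator responds most strongly near $\omega \approx \sqrt{\beta\gamma}$.

```python
import cmath

def uncoupled_response(omega, alpha, beta_gamma):
    """Ratio u+/dI+ from Eqn. (7) with M = 0."""
    s = alpha - 1j * omega
    return s / (s * s + beta_gamma)

alpha, beta_gamma = 0.2, 1.0      # illustrative values, arbitrary units
for omega in (0.3, 1.0, 2.0):
    r = uncoupled_response(omega, alpha, beta_gamma)
    print(f"omega={omega:.1f}  |u/dI|={abs(r):.3f}  phase={cmath.phase(r):+.3f} rad")
```

The modulus gives the amplification and the argument gives the constant phase shift mentioned above.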

2.2 Nonlinearity

At large response amplitudes, nonlinearity in $g_u$ and $g_v$ significantly
modifies the response. We will focus on the nonlinearity in $g_u$ only, since
$g_v$ only affects the local synaptic input while $g_u$ also affects the long
range input mediated by $\mathsf{J}$ and $\mathsf{W}$. We categorize the
nonlinearity into two general classes in terms of how $g_u$ deviates from
linearity near the fixed point $\bar u$:

$$\text{class I: } g_u(u_i) \approx u_i - a u_i^3, \qquad \text{class II: } g_u(u_i) \approx u_i + a u_i^3 - b u_i^5, \tag{9}$$

where $a, b > 0$, and $u_i$ is measured from the fixed-point value $\bar u_i$.
Class I and II nonlinearities differ in whether the gain $g_u'$ decreases or
increases (before saturation) as one moves away from the equilibrium point,
and will lead to qualitatively different behavior, as will be shown. We will
not treat the more general case where $g_u(u)$ is not an odd function of $u$.
However, to lowest order a quadratic term just acts to shift the equilibrium
point, and a quartic one does not affect our results qualitatively.
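
The difference between the two classes in Eqn. (9) is easy to see numerically. This short sketch uses assumed coefficients $a = b = 1$ and a hypothetical helper `gain` (a central finite difference) to compare the slope of each activation function slightly away from the fixed point.

```python
def g_class1(u, a=1.0):
    # Class I: gain decreases away from the fixed point.
    return u - a * u**3

def g_class2(u, a=1.0, b=1.0):
    # Class II: gain first increases, then saturates through the u^5 term.
    return u + a * u**3 - b * u**5

def gain(g, u, h=1e-6):
    """Central-difference estimate of the slope g'(u)."""
    return (g(u + h) - g(u - h)) / (2.0 * h)

gain1 = gain(g_class1, 0.2)   # analytically 1 - 3*a*u^2
gain2 = gain(g_class2, 0.2)   # analytically 1 + 3*a*u^2 - 5*b*u^4
```

Both functions have unit slope at $u = 0$; at $u = 0.2$ the class I gain falls below 1 while the class II gain rises above it.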



3 Learning, neural dynamics, and model behavior

In our treatment, we distinguish a learning mode, in which the oscillating pat- 
terns are imprinted in the synaptic connections J and W, from a recall mode, 
in which connection strengths do not change. Of course, this distinction is 
somewhat artificial; real neural dynamics may not be separated so cleanly into 
such distinct modes. Nevertheless, cholinergic modulatory effects probably do
weaken synapses during learning [12], so there is an experimental basis for the
distinction, and it is conceptually indispensable.

In what follows we will consider learning of oscillation patterns of two kinds.
In one, two local oscillators are either in phase with each other or 180° out
of phase, i.e., we can write $u_i(t) \propto \xi_i \cos\omega t$, where the
$\xi_i$ are real numbers (either positive or negative) describing the
amplitudes on the different sites. In the second kind of pattern, different
local oscillators can have different phases:
$u_i(t) \propto \cos(\omega t - \phi_i)$. We can describe both cases by writing
$u_i(t) = \xi_i e^{-i\omega t} + \text{c.c.}$, taking the $\xi_i$ real in the
first case and complex ($\xi_i = |\xi_i| e^{i\phi_i}$) in the second. Thus we
will often call the first case "real patterns" and the second "complex
patterns".



3.1 Learning mode

Let $C_{ij}$ be the synaptic strength from presynaptic unit $j$ to
postsynaptic unit $i$. Let $x_j(t)$ and $y_i(t)$ represent the corresponding
activities relative to some stationary levels at which no changes in synaptic
strength occur. Then $C_{ij}$ changes during the learning interval $[0, T]$
according to

$$\delta C_{ij} = \eta\,\langle y_i(t) A(t - t') x_j(t') \rangle = \frac{\eta}{T} \int_0^T dt \int_{-\infty}^{\infty} dt'\, y_i(t)\, A(t - t')\, x_j(t'), \tag{10}$$

where $\eta$ is the learning rate and $T$ may be taken equal to the period of
the oscillating input. The kernel $A(t - t')$ is the measure of the strength of
synaptic change at time delay $\tau = t - t'$. For example, conventional
Hebbian learning, with $A(\tau) \propto \delta(\tau)$ (used, e.g., in [?]),
gives $\delta C_{ij} \propto \int_0^T dt\, u_i(t) u_j(t)$. Some experiments [?]
suggest $A(\tau)$ to be a nearly antisymmetric function of $\tau$, positive
(LTP) for $\tau > 0$ and negative (LTD) for $\tau < 0$ (see Fig. 1EFG).
However, for the moment we do not restrict its shape.
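
For the conventional Hebbian limit $A(\tau) \propto \delta(\tau)$, Eqn. (10) reduces to a period average of the product of the two activities. The sketch below evaluates that average for two unit-amplitude cosines with phases $\phi_i$ and $\phi_j$; the unit amplitudes, the choice $\eta = 1$ and the function name are assumptions made for illustration.

```python
import math

def hebbian_change(phi_i, phi_j, n=1000):
    """Period average of cos(wt - phi_i) * cos(wt - phi_j),
    i.e. Eqn. (10) with A(tau) = delta(tau) and eta = 1 (assumed)."""
    total = 0.0
    for k in range(n):
        wt = 2.0 * math.pi * k / n          # omega * t sampled over one period
        total += math.cos(wt - phi_i) * math.cos(wt - phi_j)
    return total / n

in_phase = hebbian_change(0.0, 0.0)
anti_phase = hebbian_change(0.0, math.pi)
quadrature = hebbian_change(0.0, math.pi / 2)
```

The result equals $\tfrac{1}{2}\cos(\phi_i - \phi_j)$: in-phase units are potentiated, antiphase units depressed, and units a quarter period apart are left unchanged, which is why a kernel with temporal structure is needed to imprint phase relations.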

Applying the learning rule to our connections $J_{ij}$ and $W_{ij}$, we use
Eqn. (10) with $x = u$, $C = J$ or $W$, and $y = u$ or $v$, respectively,
giving:

$$J_{ij} = \frac{1}{NT} \int_0^T dt \int_{-\infty}^{\infty} dt'\, u_i(t)\, A_J(t - t')\, u_j(t'), \qquad
W_{ij} = \frac{1}{NT} \int_0^T dt \int_{-\infty}^{\infty} dt'\, v_i(t)\, A_W(t - t')\, u_j(t'). \tag{11}$$

We have absorbed the learning rates into the definition of the kernels
$A_{J,W}$ and added the conventional normalizing factor $1/N$ for convenience
in doing the mean field calculations.

As mentioned above, cholinergic modulation can affect the strengths of
long-range connections in the brain; these are apparently almost ineffective
during learning [12]. The neural dynamics is then simplified in our model by
turning off $\mathsf{J}$ and $\mathsf{W}$ (and thus $\mathsf{M}$) in the
learning phase.

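Consider the learning of a single input pattern,
$\delta\mathbf{I} = \boldsymbol{\xi}^\mu e^{-i\omega_\mu t} + \text{c.c.}$ We
calculate separately the responses $u_i^\pm$ and $v_i^\pm$ to the positive- and
negative-frequency parts of the input, add them together, and use the
resulting $u_i(t)$ and $v_i(t)$ in Eqns. (11) to calculate $J_{ij}$ and
$W_{ij}$. For the positive-frequency response we obtain

$$u_i^+ = \frac{\alpha - i\omega_\mu}{(\alpha - i\omega_\mu)^2 + \beta\gamma}\, \xi_i^\mu e^{-i\omega_\mu t} \equiv \chi_0(\omega_\mu)\, \xi_i^\mu e^{-i\omega_\mu t}, \tag{12}$$

$$v_i^+ = \frac{\gamma}{\alpha - i\omega_\mu}\, \chi_0(\omega_\mu)\, \xi_i^\mu e^{-i\omega_\mu t}. \tag{13}$$

The quantity $\chi_0(\omega)$ is the output-to-input ratio for the network
with $\mathsf{J}$ and $\mathsf{W}$ equal to zero. The responses $u^-$ and $v^-$
to the corresponding negative-frequency driving pattern
$\delta\mathbf{I}^- = \boldsymbol{\xi}^{\mu *} e^{i\omega_\mu t}$ are the
complex conjugates of (12) and (13), respectively.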

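Substituting these responses into Eqns. (11) yields connections

$$J_{ij} = \frac{2}{N}\, \mathrm{Re}\left[ \tilde A_J(\omega_\mu)\, \xi_i^\mu \xi_j^{\mu *} \right], \qquad
W_{ij} = \frac{2}{N}\, \mathrm{Re}\left[ \frac{\gamma}{\alpha - i\omega_\mu}\, \tilde A_W(\omega_\mu)\, \xi_i^\mu \xi_j^{\mu *} \right], \tag{14}$$

where

$$\tilde A_{J,W}(\omega) = |\chi_0(\omega)|^2 \int_{-\infty}^{\infty} d\tau\, A_{J,W}(\tau)\, e^{-i\omega\tau} \tag{15}$$

can be thought of as an effective learning rate at a frequency $\omega$. The
factor of the Fourier transform of the learning kernel carries the information
about different efficacies of learning for different postsynaptic-presynaptic
spike time differences, while the factor $|\chi_0(\omega)|^2$ reflects the
responsiveness of the uncoupled local oscillators
($\mathsf{J} = \mathsf{W} = 0$) in the learning phase. Note that
$\mathrm{Im}\, \tilde A_{J,W}(\omega) = 0$ if $A_{J,W}(\tau)$ is symmetric in
$\tau$ and that $\mathrm{Re}\, \tilde A_{J,W}(\omega) = 0$ if $A_{J,W}(\tau)$
is antisymmetric.

We will sometimes denote the real and imaginary parts of $\tilde A_{J,W}$ by
$\tilde A'_{J,W}$ and $\tilde A''_{J,W}$, respectively.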
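
The symmetry statement about Eqn. (15) can be checked numerically. The sketch below assumes a hypothetical antisymmetric exponential window with a 10 ms width (one common idealization of spike-timing-dependent plasticity, not a form given in this paper) and integrates it with a midpoint rule. The values $\alpha = \omega$ and $\beta\gamma = \omega^2$ are illustrative assumptions.

```python
import cmath
import math

def stdp_kernel(tau, amp=1.0, width=0.010):
    # Hypothetical antisymmetric window: LTP for tau > 0, LTD for tau < 0.
    if tau > 0:
        return amp * math.exp(-tau / width)
    if tau < 0:
        return -amp * math.exp(tau / width)
    return 0.0

def kernel_ft(kernel, omega, t_max=0.2, dt=1e-5):
    """Midpoint-rule estimate of the integral of A(tau) exp(-i omega tau)."""
    n = int(round(t_max / dt))
    total = 0j
    for k in range(-n, n):
        tau = (k + 0.5) * dt            # midpoints avoid the jump at tau = 0
        total += kernel(tau) * cmath.exp(-1j * omega * tau)
    return total * dt

def effective_rate(kernel, omega, alpha, beta_gamma):
    """Eqn. (15): |chi_0(omega)|^2 times the kernel's Fourier transform."""
    s = alpha - 1j * omega
    chi0 = s / (s * s + beta_gamma)
    return abs(chi0) ** 2 * kernel_ft(kernel, omega)

omega = 2.0 * math.pi * 40.0            # 40 Hz drive, in rad/s
rate = effective_rate(stdp_kernel, omega, alpha=omega, beta_gamma=omega**2)
```

For this antisymmetric window the real part of the effective learning rate vanishes and the imaginary part is negative, matching the closed form $-2i\,\omega w^2 / (1 + \omega^2 w^2)$ for the bare transform with unit amplitude and width $w$.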

The resulting effective coupling $\mathsf{M}$ between oscillators after
learning, under positive-frequency external drive $\delta\mathbf{I}^+$ of
frequency $\omega > 0$ (in general $\omega \neq \omega_\mu$), is

$$M_{ij} = \frac{2(\alpha - i\omega)}{N}\, \mathrm{Re}\left[ \tilde A_J(\omega_\mu)\, \xi_i^\mu \xi_j^{\mu *} \right] - \frac{2\beta}{N}\, \mathrm{Re}\left[ \frac{\gamma}{\alpha - i\omega_\mu}\, \tilde A_W(\omega_\mu)\, \xi_i^\mu \xi_j^{\mu *} \right]. \tag{16}$$

The dependence of the neural connections $\mathsf{J}$ and $\mathsf{W}$ and the
oscillator couplings $\mathsf{M}$ on $\xi_i^\mu \xi_j^{\mu *}$ is just a
natural generalization of the Hebb-Hopfield factor $\xi_i^\mu \xi_j^\mu$ for
(real) static patterns. This becomes particularly clear if we consider the
special case when there is the following matching condition between the two
kernels:

$$\tilde A_J(\omega_\mu) = \frac{\beta\gamma}{\alpha^2 + \omega_\mu^2}\, \tilde A_W(\omega_\mu), \qquad \omega = \omega_\mu. \tag{17}$$

Then the oscillator coupling simplifies into a familiar outer-product form for
complex vectors $\boldsymbol{\xi}$:

$$M_{ij} = -2i\omega_\mu\, \tilde A_J(\omega_\mu)\, \xi_i^\mu \xi_j^{\mu *} / N. \tag{18}$$

To construct the corresponding matrices for multiple patterns (which we will
always take to be random and independent) we simply sum (14) over input
patterns, labeled by the index $\mu$, as for the Hopfield model. We restrict
attention to the case where the number $P$ of stored patterns is negligible in
comparison with $N$, the size of the network (though it may be $\gg 1$). So
far, all our results apply for both real and complex patterns.
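
The reduction from Eqn. (16) to the outer-product form (18) under the matching condition (17) can be confirmed numerically. In this sketch the pattern phases, the kernel value `A_J` and all parameter values are arbitrary assumptions, and `coupling` is a hypothetical helper name.

```python
import cmath

def coupling(xi, A_J, A_W, alpha, beta, gamma, omega, omega_mu):
    """Effective coupling M_ij of Eqn. (16) for one imprinted pattern xi."""
    n = len(xi)
    M = []
    for i in range(n):
        row = []
        for j in range(n):
            outer = xi[i] * xi[j].conjugate()
            J_ij = 2.0 / n * (A_J * outer).real
            W_ij = 2.0 / n * (gamma / (alpha - 1j * omega_mu) * A_W * outer).real
            row.append((alpha - 1j * omega) * J_ij - beta * W_ij)
        M.append(row)
    return M

alpha, beta, gamma, omega_mu = 0.3, 1.0, 1.0, 1.2      # illustrative values
xi = [cmath.exp(1j * phi) for phi in (0.0, 0.7, 2.1, 4.0)]
A_J = 0.25 - 0.1j
A_W = (alpha**2 + omega_mu**2) / (beta * gamma) * A_J   # matching condition (17)

M = coupling(xi, A_J, A_W, alpha, beta, gamma, omega_mu, omega_mu)
n = len(xi)
outer_form = [[-2j * omega_mu * A_J * xi[i] * xi[j].conjugate() / n
               for j in range(n)] for i in range(n)]    # Eqn. (18)
max_diff = max(abs(M[i][j] - outer_form[i][j])
               for i in range(n) for j in range(n))
```

The two matrices agree to rounding error, for any complex `A_J`, as long as the drive frequency equals the imprinting frequency.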



3.2 Recall mode 

After learning, the connections are fixed and the response $\mathbf{u}^+$ to
an input $\delta\mathbf{I}^+ \propto e^{-i\omega t}$ is described by Eqn. (7).
To solve it, we need to know how the $\mathsf{M}$ matrix acts on input vectors.
We consider uncorrelated patterns all learned at the same frequency
($\omega_\mu$ independent of $\mu$). Then it is easy to see that $\mathsf{M}$
is proportional to a projector onto the space spanned by the imprinted
patterns. It has $P$ eigenvectors (the imprinted patterns) with the same
nonzero eigenvalue and $N - P$ with eigenvalue zero. These are standard
properties of outer-product constructions for orthogonal vectors; we can treat
our $\boldsymbol{\xi}^\mu$ as effectively orthogonal here because we are taking
the components $\xi_i^\mu$ to be independent and $N \gg P$ [?].

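The nonvanishing eigenvalue of $\mathsf{M}$, which we denote
$\Pi(\omega; \omega_\mu)$, is simply computed as

$$\Pi(\omega; \omega_\mu) = (\alpha - i\omega)\, \tilde A_J(\omega_\mu) - \frac{\beta\gamma}{\alpha - i\omega_\mu}\, \tilde A_W(\omega_\mu) \tag{19}$$

for complex patterns, and

$$\Pi(\omega; \omega_\mu) = 2(\alpha - i\omega)\, \mathrm{Re}\, \tilde A_J(\omega_\mu) - 2\beta\gamma\, \mathrm{Re}\left[ \frac{\tilde A_W(\omega_\mu)}{\alpha - i\omega_\mu} \right] \tag{20}$$

for real patterns.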

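Thus, from Eqn. (7), the response $\mathbf{u}^+$ to an input
$\delta\mathbf{I}^+$ in the imprinted-pattern subspace is

$$\mathbf{u}^+ = \chi(\omega; \omega_\mu)\, \delta\mathbf{I}^+, \tag{21}$$

with the linear response coefficient or susceptibility

$$\chi(\omega; \omega_\mu) = \frac{\alpha - i\omega}{\alpha^2 + \beta\gamma - \omega^2 - 2i\omega\alpha - \Pi(\omega; \omega_\mu)}. \tag{22}$$

To achieve a resonant response to an input at the imprinting frequency
($\omega = \omega_\mu$), the learning kernels should be adjusted so that both
the real and imaginary parts of the denominator in
$\chi(\omega_\mu; \omega_\mu)$ are close to zero, i.e.,

$$\epsilon \equiv \alpha^2 + \beta\gamma - \omega_\mu^2 - \mathrm{Re}\, \Pi(\omega_\mu; \omega_\mu) \to 0, \tag{23}$$

$$\Delta \equiv 2\omega_\mu\alpha + \mathrm{Im}\, \Pi(\omega_\mu; \omega_\mu) \to 0. \tag{24}$$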
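
Eqns. (19) and (22) to (24) can be tied together in a few lines. The sketch below builds $\mathsf{M}$ from Eqns. (8) and (14) for one complex pattern, checks that the pattern is an eigenvector with eigenvalue $\Pi$, and confirms that the denominator of $\chi$ at $\omega = \omega_\mu$ equals $\epsilon - i\Delta$. The pattern with equally spaced phases is an assumed special case for which $\sum_j \xi_j^2 = 0$ exactly (for random phases this sum is only small compared with $N$); kernel and parameter values are arbitrary.

```python
import cmath
import math

def pi_complex(omega, omega_mu, A_J, A_W, alpha, bg):
    """Eigenvalue of M for complex patterns, Eqn. (19); bg = beta * gamma."""
    return (alpha - 1j * omega) * A_J - bg / (alpha - 1j * omega_mu) * A_W

def chi(omega, omega_mu, A_J, A_W, alpha, bg):
    """Susceptibility, Eqn. (22)."""
    denom = (alpha**2 + bg - omega**2 - 2j * omega * alpha
             - pi_complex(omega, omega_mu, A_J, A_W, alpha, bg))
    return (alpha - 1j * omega) / denom

alpha, bg, omega_mu = 0.3, 1.0, 1.2          # illustrative values
A_J, A_W = 0.25 - 0.1j, 0.05 + 0.2j
n = 8
xi = [cmath.exp(2j * math.pi * k / n) for k in range(n)]

def apply_M(vec, omega):
    """Apply M (Eqn. (8) with J, W from Eqn. (14)) to a vector."""
    out = []
    for i in range(n):
        acc = 0j
        for j in range(n):
            outer = xi[i] * xi[j].conjugate()
            J_ij = 2.0 / n * (A_J * outer).real
            W_over_gamma = 2.0 / n * (A_W * outer / (alpha - 1j * omega_mu)).real
            acc += ((alpha - 1j * omega) * J_ij - bg * W_over_gamma) * vec[j]
        out.append(acc)
    return out

Pi = pi_complex(omega_mu, omega_mu, A_J, A_W, alpha, bg)
eps = alpha**2 + bg - omega_mu**2 - Pi.real      # Eqn. (23)
delta = 2.0 * omega_mu * alpha + Pi.imag         # Eqn. (24)
M_xi = apply_M(xi, omega_mu)
eig_err = max(abs(m - Pi * x) for m, x in zip(M_xi, xi))
```

With `eps` and `delta` in hand, $\chi(\omega_\mu; \omega_\mu) = (\alpha - i\omega_\mu)/(\epsilon - i\Delta)$, so making both small is what produces the resonance.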



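For real patterns,
$\mathrm{Im}\, \Pi(\omega_\mu; \omega_\mu) = -2\omega_\mu \tilde A'_J$, so
$\Delta = 2\omega_\mu(\alpha - \tilde A'_J)$. Thus small $\Delta$ requires a
positive $\tilde A'_J > 0$, i.e., stronger positive-$\tau$ LTP than
negative-$\tau$ LTD for excitatory-excitatory couplings (again, provided the
typical values of $\tau$ for which $A_J(\tau)$ is sizeable are small compared
to the oscillation period). From Eqn. (20),

$$\epsilon = \alpha^2 + \beta\gamma - \omega_\mu^2 - 2\, \mathrm{Re}\left[ \alpha \tilde A_J(\omega_\mu) - \frac{\beta\gamma\, \tilde A_W(\omega_\mu)}{\alpha - i\omega_\mu} \right]. \tag{25}$$

Thus, for a given $\omega_\mu$, the resonance condition enforces a constraint
on a linear combination of $\tilde A'_J$, $\tilde A'_W$ and $\tilde A''_W$.
However, we note that $\tilde A''_J$ does not appear anywhere; it is simply
irrelevant to learning real patterns.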
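For complex patterns,

$$\epsilon = \alpha^2 + \beta\gamma - \omega_\mu^2 - (\alpha \tilde A'_J + \omega_\mu \tilde A''_J) + \frac{\beta\gamma}{\alpha^2 + \omega_\mu^2}\, (\alpha \tilde A'_W - \omega_\mu \tilde A''_W), \tag{26}$$

$$\Delta = 2\omega_\mu\alpha + (\alpha \tilde A''_J - \omega_\mu \tilde A'_J) - \frac{\beta\gamma}{\alpha^2 + \omega_\mu^2}\, (\alpha \tilde A''_W + \omega_\mu \tilde A'_W). \tag{27}$$

(We have temporarily suppressed the $\omega_\mu$-dependence of
$\tilde A'_{J,W}$ and $\tilde A''_{J,W}$ to save space.) One can get some
insight here by considering the time-shifted learning kernels
$A_J(\tau + \theta_\mu/\omega_\mu)$ and $A_W(\tau - \theta_\mu/\omega_\mu)$,
where $\theta_\mu = \tan^{-1}(\alpha/\omega_\mu)$. (For
$\alpha \approx \omega_\mu \approx 40$ Hz, these shifts are around 3 ms.) In
terms of the associated frequency-domain quantities,

$$B_{J,W}(\omega_\mu) = |\chi_0(\omega_\mu)|^2 \int_{-\infty}^{\infty} d\tau\, A_{J,W}(\tau \pm \theta_\mu/\omega_\mu)\, e^{-i\omega_\mu\tau} = \tilde A_{J,W}(\omega_\mu)\, e^{\pm i\theta_\mu}, \tag{28}$$

we can write Eqns. (26) and (27) as

$$\epsilon = \alpha^2 + \beta\gamma - \omega_\mu^2 - \sqrt{\alpha^2 + \omega_\mu^2}\; B''_J(\omega_\mu) - \frac{\beta\gamma}{\sqrt{\alpha^2 + \omega_\mu^2}}\; B''_W(\omega_\mu), \tag{29}$$

$$\Delta = 2\omega_\mu\alpha - \sqrt{\alpha^2 + \omega_\mu^2}\; B'_J(\omega_\mu) - \frac{\beta\gamma}{\sqrt{\alpha^2 + \omega_\mu^2}}\; B'_W(\omega_\mu). \tag{30}$$

Thus, the imaginary parts of the frequency-domain kernels
$B_{J,W}(\omega_\mu)$ shift the resonant frequency and the real parts control
the damping. In particular, one needs at least one of $B'_{J,W}$ to be positive
to achieve good frequency tuning. Negative (positive) imaginary parts
$B''_{J,W}$ increase (decrease) the resonant frequency.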

When the learning window widths in the kernels $A_{J,W}(\tau)$ are much smaller
than the oscillation period, the shifts by $\pm\theta_\mu/\omega_\mu$ do not
affect the real parts of $B_{J,W}$ strongly. However, for window shapes (see
Fig. 1E) that change rapidly from negative to positive around $\tau = 0$, the
imaginary parts can be strongly suppressed, even for fairly small shifts.
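
Two quick checks make the shifted-kernel picture concrete: the size of the time shift $\theta_\mu/\omega_\mu$, and the equivalence of the unshifted expressions (26), (27) with the shifted ones (29), (30). The sketch assumes that "40 Hz" denotes a frequency $f$ with $\omega_\mu = 2\pi f$ and that $\alpha$ is numerically equal to $\omega_\mu$; the kernel values are arbitrary and the function names are hypothetical.

```python
import cmath
import math

def eps_delta_unshifted(A_J, A_W, alpha, bg, w):
    """Eqns. (26) and (27), written with the kernels A~_J and A~_W."""
    c = bg / (alpha**2 + w**2)
    eps = (alpha**2 + bg - w**2 - (alpha * A_J.real + w * A_J.imag)
           + c * (alpha * A_W.real - w * A_W.imag))
    delta = (2.0 * w * alpha + (alpha * A_J.imag - w * A_J.real)
             - c * (alpha * A_W.imag + w * A_W.real))
    return eps, delta

def eps_delta_shifted(A_J, A_W, alpha, bg, w):
    """Eqns. (29) and (30), written with B = A~ exp(+/- i theta), Eqn. (28)."""
    theta = math.atan2(alpha, w)           # theta = arctan(alpha / omega)
    r = math.hypot(alpha, w)               # sqrt(alpha^2 + omega^2)
    B_J = A_J * cmath.exp(1j * theta)
    B_W = A_W * cmath.exp(-1j * theta)
    eps = alpha**2 + bg - w**2 - r * B_J.imag - bg / r * B_W.imag
    delta = 2.0 * w * alpha - r * B_J.real - bg / r * B_W.real
    return eps, delta

args = (0.25 - 0.1j, 0.05 + 0.2j, 0.3, 1.0, 1.2)   # illustrative values
unshifted = eps_delta_unshifted(*args)
shifted = eps_delta_shifted(*args)

# Time shift for alpha = omega_mu = 2*pi*40 rad/s (assumed reading of "40 Hz").
w40 = 2.0 * math.pi * 40.0
shift_seconds = math.atan2(w40, w40) / w40          # (pi/4) / (80*pi) = 1/320 s
```

Under these assumptions the shift is 3.125 ms, consistent with the "around 3 ms" quoted above, and both pairs of expressions return the same $\epsilon$ and $\Delta$.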

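The explicit form of the resonant response can be seen by expanding the
denominator of $\chi(\omega; \omega_\mu)$ around $\omega = \omega_\mu$:

$$\chi(\omega; \omega_\mu) \approx \frac{\alpha - i\omega}{\epsilon - i\Delta - Z\,(\omega - \omega_\mu)}, \tag{31}$$

where

$$Z \equiv Z(\omega_\mu) = 2\omega_\mu + 2i\alpha + \left. \frac{\partial \Pi(\omega; \omega_\mu)}{\partial \omega} \right|_{\omega = \omega_\mu}. \tag{32}$$

Thus $\chi$ has a pole at

$$\omega = \omega_\mu + \frac{\epsilon - i\Delta}{Z} = \omega_\mu + \frac{\epsilon Z' - \Delta Z'' - i(\Delta Z' + \epsilon Z'')}{|Z|^2}, \tag{33}$$

and, as the driving frequency $\omega$ in the recall phase is varied, the system
exhibits a resonant tuning, with a peak near $\omega_\mu$ and a linewidth equal
to $(\Delta Z' + \epsilon Z'')/|Z|^2$.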

One has to check that the desired learning rates and kernels do not violate
the condition that the response function (22) be causal, i.e., small
perturbations decay in time. Analytically, the requirement is that all
singularities of $\chi(\omega; \omega_\mu)$ must lie in the lower half of the
complex $\omega$ plane. Thus, in Eqn. (33), we need
$\Delta Z' + \epsilon Z''$ to be positive.

For real patterns, the analysis is fairly simple. From Eqns. (20), (24) and
(32), we obtain $Z = 2\omega_\mu + 2i(\alpha - \tilde A'_J)$ and
$\Delta = 2\omega_\mu(\alpha - \tilde A'_J)$. Thus, for $\Delta \to 0$,
$Z \to 2\omega_\mu$, and the stability condition is simply that $\Delta$ be
positive, i.e.,

$$\tilde A'_J(\omega_\mu) < \alpha.$$

For complex patterns, we get
$Z = 2\omega_\mu + \tilde A''_J + i(2\alpha - \tilde A'_J)$. Requiring
$\Delta Z' + \epsilon Z''$ to be positive then imposes constraints on the
signs and relative magnitudes of $\epsilon$ and $\Delta$, depending on $Z'$
and $Z''$. We omit the details.
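
The pole formula (33) and the stability requirement can be explored with a short sketch. It uses the real-pattern expressions $Z = 2\omega_\mu + 2i(\alpha - \tilde A'_J)$ and $\Delta = 2\omega_\mu(\alpha - \tilde A'_J)$ with $\epsilon = 0$; the numerical values and function names are assumptions for illustration.

```python
def pole(omega_mu, eps, delta, Z):
    """Pole of chi from Eqn. (33); stable if its imaginary part is negative."""
    return omega_mu + (eps - 1j * delta) / Z

def linewidth(eps, delta, Z):
    """Linewidth (Delta Z' + eps Z'') / |Z|^2 quoted after Eqn. (33)."""
    return (delta * Z.real + eps * Z.imag) / abs(Z) ** 2

alpha, omega_mu = 0.3, 1.2                 # illustrative values

def real_pattern_case(a_re):
    """Pole and linewidth for real patterns with Re A~_J = a_re, eps = 0."""
    Z = 2.0 * omega_mu + 2j * (alpha - a_re)
    delta = 2.0 * omega_mu * (alpha - a_re)
    return pole(omega_mu, 0.0, delta, Z), linewidth(0.0, delta, Z)

stable_pole, stable_width = real_pattern_case(0.9 * alpha)      # below alpha
unstable_pole, unstable_width = real_pattern_case(1.1 * alpha)  # above alpha
```

A learning rate just below $\alpha$ places the pole slightly below the real axis (narrow, stable resonance), while a rate above $\alpha$ moves it into the upper half plane, which is the instability excluded by the condition above.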
Notice that, for both the real and complex cases, the stability analysis does
not depend on the W-learning kernel $A_W$ at all (except insofar as it affects
$\epsilon$ and $\Delta$). This is because $Z$ involves the derivative
$\partial\Pi(\omega; \omega_\mu)/\partial\omega$, and in both cases the only
$\omega$-dependence of $\Pi$ is in the factors $(\alpha - i\omega)$ in the
first terms of Eqns. (19) and (20), which do not involve $A_W$.

Fig. 2 shows examples of the frequency tuning described by Eqn. (31), as
obtained from simulations of small networks, including the nonlinearities
described in Sect. 2.2. Nonlinearity makes the response deviate from the
linear prediction when the amplitude is larger, as happens near the resonance
frequency. In particular, class I and class II nonlinearities lead to reduced
and enhanced responses, respectively, relative to the linear prediction, as
will be analyzed in detail later.



3.2.1 Examples of plasticity 

We use examples to illustrate the constraints that resonance and stability con- 
ditions place on the shape of the kernels for complex patterns. 



J only 

If the patterns are imprinted only in the excitatory-excitatory connections, we
have only the first terms on the right-hand sides of Eqns. (19) and (20).

For real patterns $\Delta = 2\omega_\mu(\alpha - \tilde A'_J)$ (no different
from the general case), so using $\Delta \to 0$ in Eqn. (25) yields
$\epsilon = \beta\gamma - \alpha^2 - \omega_\mu^2$. This then (from
$\epsilon = 0$) fixes the imprinting frequency
$\omega_\mu = \sqrt{\beta\gamma - \alpha^2}$.

For complex patterns, we have
$\Delta = 2\omega_\mu\alpha - \sqrt{\alpha^2 + \omega_\mu^2}\, B'_J$, i.e.,
similar to the real-pattern case but with a shifted learning window and a
different effective strength. The resonant frequency shifts down or up
according to the sign of $B''_J$.



Same-sign plasticities in J and W 

Let us consider the case where the kernels are related by the matching
condition (17). While the exact match is clearly a special case, the
simplification it yields in the algebra permits some insight which can be
expected to carry over qualitatively to other cases where the two kernels have
similar shapes and comparable magnitudes. Here we find, from Eqns. (19) and
(20),

$$\Pi(\omega_\mu; \omega_\mu) = -2i\omega_\mu\, \tilde A_J(\omega_\mu), \tag{34}$$

for both real and complex patterns. Applying the resonance condition equations
(23) and (24), we have

$$\tilde A''_J(\omega_\mu) = \frac{-\omega_\mu^2 + \alpha^2 + \beta\gamma - \epsilon}{2\omega_\mu}, \tag{35}$$

$$\tilde A'_J(\omega_\mu) = \alpha - \Delta/2\omega_\mu. \tag{36}$$
Figure 2: Frequency tuning, shown as response to $\delta\mathbf{I}^+$.
A: Temporal activities of 3 of the 10 excitatory cells. $g_u$ and $g_v$ are as
in Fig. 1BD. B: Frequency tuning curve. Response amplitude
$|\langle \mathbf{u}^+ | \boldsymbol{\xi}^\mu e^{-i\omega t} \rangle|$ (simply
$|\chi(\omega; \omega_\mu)|$ in the linearized theory) to input
$\delta\mathbf{I}^+ = \boldsymbol{\xi}^\mu e^{-i\omega t}$. B.1: Using matched
kernels, from linearized theory (solid line) and from class I (stars) and II
(circles) nonlinearities. B.2: Opposite-plasticities case
($\tilde A_J = -\tilde A_W = 0.99(\alpha - i\omega_\mu)$), complex patterns.
Dashed line, circles, and triangles are results from the linearized theory and
the class II nonlinear model with $\omega_\mu = 41$ Hz and $\omega_\mu = 68$ Hz,
respectively.
Thus $\tilde A'_J(\omega_\mu)$ reduces the effective damping from $\alpha$ to
$\Delta/2\omega_\mu \ll \alpha$, and this requires
$\tilde A'_J(\omega_\mu) \approx \alpha > 0$. When the width of the learning
kernel $A_J(\tau)$ is much smaller than the oscillation period,
$\tilde A'_J(\omega_\mu) \approx \int A_J(\tau)\, d\tau$, thus a positive
$\tilde A'_J(\omega_\mu)$ requires that LTP dominate LTD in total strength.

We observe that a negative $\tilde A''_J(\omega_\mu)$, i.e., an $A_J(\tau)$
like that in Fig. 1EFG, forces $\omega_\mu$ to be greater than
$\sqrt{\alpha^2 + \beta\gamma}$ and thus greater than the intrinsic E-I pair
frequency $\sqrt{\beta\gamma}$ (a shift in the opposite direction from that in
the J-only, real-pattern case).

In general, when the width of $A_J(\tau)$ is not small, the resonance frequency
has to be determined from equations (35) and (36) by
$\tilde A''_J(\omega_\mu)/\tilde A'_J(\omega_\mu) \approx (-\omega_\mu^2 + \alpha^2 + \beta\gamma)/2\alpha\omega_\mu$.



Opposite-sign plasticities in J and W 

We turn now to the case where $A_J(\tau)$ and $A_W(\tau)$ have opposite signs
(for all $\tau$). Again, we turn to a particular matching of the magnitudes of
the two kernels to find a simple case that can give some general qualitative
insight. We use our old matching condition, Eqn. (17), but with a minus sign.
For complex patterns we now find
$\Pi(\omega_\mu; \omega_\mu) = 2\alpha \tilde A_J(\omega_\mu)$, and, applying
the resonance condition equations (23) and (24),

$$\tilde A'_J(\omega_\mu) = \frac{-\omega_\mu^2 + \alpha^2 + \beta\gamma - \epsilon}{2\alpha}, \tag{37}$$

$$-\tilde A''_J(\omega_\mu) = \omega_\mu - \Delta/2\alpha. \tag{38}$$

Comparing these with equations (35) and (36) and the accompanying analysis,
we see that the roles of the real and imaginary parts of $\tilde A_J$ have been
reversed: now it is the imaginary part that is constrained to be near a fixed
value ($-\omega_\mu$) by the $\Delta \to 0$ condition, and the real part that
enters in the $\epsilon$ equation. We note that we need
$\tilde A''_J(\omega_\mu) < 0$ (i.e., like the case shown in Fig. 1E), in order
to obtain a small $\Delta$, and that the sign of $\tilde A'_J(\omega_\mu)$
determines whether the resonance frequency is larger or smaller than
$\sqrt{\alpha^2 + \beta\gamma}$.

Another interesting special case for complex patterns is when
$\tilde A_W = -\tilde A_J$, with the particular choice

$$\tilde A_J(\omega_\mu) = \alpha - i\omega_\mu. \tag{39}$$

This leads to the remarkably simple result

$$\chi(\omega; \omega_\mu) = \frac{i}{\omega - \omega_\mu}. \tag{40}$$

That is, the choice (39) satisfies both constraints,
$\epsilon, \Delta \to 0$, and, in addition, puts the resonance right at the
original driving frequency. Fig. 2.B.2 shows a frequency tuning curve obtained
with $\tilde A_J(\omega_\mu) = 0.99(\alpha - i\omega_\mu)$.
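
The simple form of Eqn. (40) follows from substituting the choice (39), with $\tilde A_W = -\tilde A_J$, into Eqns. (19) and (22). The sketch below checks this numerically and also shows what the factor 0.99 does. Parameter values are arbitrary assumptions, and `chi_opposite` is a hypothetical helper name.

```python
def chi_opposite(omega, omega_mu, alpha, bg, scale=1.0):
    """chi of Eqn. (22) for A~_W = -A~_J and A~_J = scale*(alpha - i*omega_mu).

    bg stands for the product beta * gamma.
    """
    A_J = scale * (alpha - 1j * omega_mu)
    # Eqn. (19) with A~_W = -A~_J, so the second term changes sign.
    Pi = (alpha - 1j * omega) * A_J + bg / (alpha - 1j * omega_mu) * A_J
    denom = alpha**2 + bg - omega**2 - 2j * omega * alpha - Pi
    return (alpha - 1j * omega) / denom

alpha, bg, omega_mu = 0.3, 1.0, 1.2            # illustrative values
exact = [chi_opposite(w, omega_mu, alpha, bg) for w in (0.8, 1.0, 1.5)]
target = [1j / (w - omega_mu) for w in (0.8, 1.0, 1.5)]

# With scale = 0.99 the pole moves off the real axis: a large but finite peak.
peak = abs(chi_opposite(omega_mu, omega_mu, alpha, bg, scale=0.99))
off_peak = abs(chi_opposite(0.8, omega_mu, alpha, bg, scale=0.99))
```

With `scale=1.0` the response diverges exactly at $\omega = \omega_\mu$, so a slightly smaller learning rate such as 0.99 is what yields a sharp but finite tuning curve.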

To understand the prescription $\tilde A_J(\omega_\mu) = \alpha - i\omega_\mu$,
consider an oscillation period much greater than the temporal width of the
learning kernel. Then $\tilde A'_J(\omega_\mu) \approx \int A_J(\tau)\, d\tau$
and
$\tilde A''_J(\omega_\mu) \approx -\int A_J(\tau)\, \omega_\mu \tau\, d\tau \propto -\omega_\mu$.
Thus the prescription just requires
$\int A_J(\tau)\, d\tau \approx \alpha > 0$ (LTP dominates LTD in total
strength) and $\int A_J(\tau)\, \tau\, d\tau \approx 1 > 0$ (LTP when
postsynaptic spikes follow presynaptic ones, and LTD for the opposite order).
This means that $A_J(\tau)$ should look like Fig. 1E and $A_W(\tau)$ like its
negative.

We remark that for real patterns this choice of kernels does not produce 
resonant oscillations; in fact, it leads to instability. 



3.2.2 Pattern selectivity 

We now consider an input <5I + = £c~ lwt that does not match the imprinted pat- 
tern In general, we can decompose it into a component along ^ and a 
component in the complementary subspace: 8I + = Sit + fil±, with 8lt = 

(flOfe- 1 "* = ^~ 1 (Eifj%')f e ~ iWt - then ' MI+ = n ( w 5W M )lJ", and 

u+=x(w;w |i )ilJ-+xo(w)«t (41) 

The first term will be resonant at $\omega = \omega_\mu$, but the second will not. Thus, the
system amplifies the component of the input along the stored pattern relative to
the orthogonal one, as shown in Fig. 3. Again, nonlinearity makes the response
deviate from the linear prediction at high response amplitudes, reducing and
enhancing the responses for the class I and II nonlinearities, respectively. Class
II nonlinearity also leads to hysteresis, with sustained responses even after the
input is withdrawn, i.e., $|\langle \xi^\mu | \xi \rangle| \to 0$. The pattern selectivity can be measured
by the ratio



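$$\frac{|\chi(\omega_\mu;\omega_\mu)|}{|\chi_0(\omega_\mu)|} = \frac{\cdots}{\sqrt{\Delta^2 + \epsilon^2}}, \qquad (42)$$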



where we used the resonance condition (??). When the input frequency $\omega$ deviates
from $\omega_\mu$, the pattern selectivity ratio $|\chi(\omega;\omega_\mu)|/|\chi_0(\omega)|$ is reduced.



3.2.3 Interpolation and categorization. 

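With multiple imprinted patterns, an input $\delta I^+ = \xi e^{-i\omega t}$ which overlaps with
several of them will evoke a correspondingly mixed resonant linear response
$u^+ = \chi(\omega;\omega_\mu)\,\delta I^+_\parallel + \chi_0(\omega)\,\delta I^+_\perp$, where $\delta I^+_\parallel = \sum_\mu \langle \xi^\mu | \xi \rangle\, \xi^\mu e^{-i\omega t}$. That is, any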
input in the pattern subspace produces a resonant linear response just like that 
to an input proportional to a single pattern. This is a standard property of 
linear associative memories for orthogonal patterns. (We remind the reader that 
when the number of patterns is much smaller than the number of units in the 
network, independent random patterns may be taken as effectively orthogonal.) 
This feature enables the system to interpolate between imprinted patterns, i.e., 
to perform an elementary form of generalization from the learned set of patterns. 
This property can be useful for input representation. 
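
The interpolation property of the linear response can be illustrated with a few lines of NumPy. This is a minimal sketch under stated assumptions: the patterns are made exactly orthonormal with a QR decomposition (a Euclidean normalization chosen for convenience), and the resonant and non-resonant gains `chi` and `chi0` are hypothetical numbers rather than values computed from the model.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 10, 2   # number of units and of imprinted patterns (illustrative sizes)

# Orthonormal complex patterns from a QR decomposition (assumed normalization).
raw = rng.normal(size=(N, P)) + 1j * rng.normal(size=(N, P))
xi, _ = np.linalg.qr(raw)   # columns xi[:, mu] are orthonormal

chi, chi0 = 20.0 + 0.0j, 1.0 + 0.0j   # hypothetical resonant / non-resonant gains

def linear_response(drive):
    """Return chi * (part of drive in the pattern subspace) + chi0 * (remainder)."""
    parallel = xi @ (xi.conj().T @ drive)
    return chi * parallel + chi0 * (drive - parallel)

# An input that mixes the two patterns lies entirely in the pattern subspace.
psi = np.deg2rad(30.0)
mixed = np.cos(psi) * xi[:, 0] + np.sin(psi) * xi[:, 1]
u = linear_response(mixed)

# An input orthogonal to both patterns only sees the non-resonant gain.
noise = rng.normal(size=N) + 1j * rng.normal(size=N)
noise -= xi @ (xi.conj().T @ noise)
v = linear_response(noise)

print(np.allclose(u, chi * mixed), np.allclose(v, chi0 * noise))
```

The mixed input is amplified by the same resonant gain as a single imprinted pattern and keeps its mixing angle, which is the sense in which the linear network interpolates.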
Figure 3: Pattern selectivity. A: Response evoked on 3 of the 10 neurons of
the network by input patterns a, b, and c matching the imprinted pattern a in
frequency. The class I activation functions shown in Fig. 1BD have been used.
B: Response amplitude $|\langle u^+ | \xi^1 e^{-i\omega_\mu t} \rangle|$ vs. input overlap under input
$\delta I^+ = \xi e^{-i\omega_\mu t}$. Results from linearized theory (solid line) and from models with
class I (stars) and II (circles) nonlinearities. C: Hysteresis effects in class II
simulations. The response amplitude depends on the history of the system: the
output remains at the resonance level after input withdrawal (circles connected by
dotted line). Circles connected by solid line correspond to the case of random or
zero-overlap initial conditions. The connecting lines are drawn for clarity only.
A similar property also holds in the class I nonlinear model, but not in the
class II model. To see this, suppose the drive $I = \xi e^{-i\omega_\mu t}$ + c.c. overlaps two
imprinted patterns, with $\xi \propto \xi^1 \cos\psi + \xi^2 \sin\psi$, and write the response $u^+$ as
$u^+ \propto \xi^1 \cos\phi + \xi^2 \sin\phi$. For a linear model, $\phi = \psi$. The class I nonlinear model
gives $45^\circ > \phi > \psi$ when $\psi < 45^\circ$ and $45^\circ < \phi < \psi$ when $\psi > 45^\circ$ (see Fig. 4A).
Thus, it tends to equalize the response amplitudes to $\xi^1$ and $\xi^2$ even when they
contribute unequally to the input. In contrast, the class II nonlinear model
amplifies the difference in input strengths to give higher gain to the stronger
input component, $\xi^1$ or $\xi^2$, thus performing a kind of categorization of the input.
Thus, the two nonlinearity classes lead to different computational properties.
For the case shown in Fig. 4B the parameters are such that the categorization is
into three categories, corresponding to outputs near $\xi^1$, $\xi^2$, and their symmetric
combination. For stronger nonlinearity, bipartite classification is possible.

Another way to prevent undesirable interpolation between imprinted patterns
(or classes of them) is to store different patterns or classes at frequencies
that differ by more than the frequency tuning width. Suppose $\xi^1 e^{-i\omega_1 t}$
and $\xi^2 e^{-i\omega_2 t}$ are imprinted, with $\omega_1 \neq \omega_2$ and $\xi^1 \cdot \xi^2 \approx 0$. Then we have
$J_{ij} = J^1_{ij} + J^2_{ij}$ and $W_{ij} = W^1_{ij} + W^2_{ij}$, where $J^\mu_{ij}$ and $W^\mu_{ij}$ are given by equations
(53) with corresponding frequencies for $\mu = 1, 2$. The resonance and stability
conditions should be enforced separately for each pattern.

After learning, an input $I^+ = (\xi^1 + \xi^2) e^{-i\omega_1 t}$ at frequency $\omega_1$ will evoke a
response

$$u^+ \approx \chi(\omega_1;\omega_1)\,\xi^1 + \chi(\omega_1;\omega_2)\,\xi^2 \approx \chi(\omega_1;\omega_1)\,\xi^1 \qquad (43)$$

since $\chi(\omega_1;\omega_2) \ll \chi(\omega_1;\omega_1)$ by design when $|\omega_1 - \omega_2| \gg \Delta$. Hence, as illustrated
in Fig. 4C, the system filters out the oscillation patterns learned at a different
oscillation frequency from the input frequency.
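
The size of this filtering effect can be estimated with a toy calculation. The sketch below assumes a single-pole susceptibility with its pole at $\omega_\mu - i\epsilon$ in the lower half-plane; this functional form and the width `eps` are assumptions made for illustration, not quantities taken from the model. The two frequencies are those quoted for the Fig. 4C example.

```python
import numpy as np

def chi(omega, omega_mu, eps):
    """Assumed single-pole susceptibility with pole at omega_mu - i*eps."""
    return 1j / (omega - omega_mu + 1j * eps)

omega1 = 2 * np.pi * 41.0   # imprinting frequencies of the Fig. 4C example (rad/s)
omega2 = 2 * np.pi * 63.0
eps = 5.0                    # hypothetical tuning half-width (rad/s)

xi1 = np.array([1.0, 0.0])   # toy orthogonal patterns
xi2 = np.array([0.0, 1.0])

# Drive (xi1 + xi2) * exp(-i*omega1*t): each component sees its own susceptibility.
u = chi(omega1, omega1, eps) * xi1 + chi(omega1, omega2, eps) * xi2

gain_matched = abs(u[0])       # response along the pattern stored at omega1
gain_mismatched = abs(u[1])    # response along the pattern stored at omega2
print("matched / mismatched gain:", gain_matched / gain_mismatched)
```

Under these assumptions the suppression of the mismatched pattern is set by the ratio of the frequency separation to the tuning width, so a narrower resonance gives cleaner separation between the two stored classes.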

4 Nonlinear analysis 

Nonlinearity affects the response mainly at large amplitudes, which occur during 
resonant recall but not (we assume) in learning mode. Hence, in the following 
analysis, we leave the formulae for J, W, and M unchanged, ignore nonlinearity 
in response components orthogonal to the pattern subspace, and examine the 
corrections to the linear response $u = \chi(\omega;\omega_\mu)\,\delta I_\parallel$. We take the input to be
along the imprinted pattern $\xi^\mu$: $\delta I_\parallel = I \xi^\mu e^{-i\omega t}$. We focus on the nonlinearity
in $g_u$, since $g'_v$ only affects the local synaptic input, while $g'_u$ also affects the
long-range input. Equation (??) then becomes

$$[(\alpha - i\omega)^2]\,u = M g_u(u) + (\alpha - i\omega)\,\delta I, \qquad (44)$$

where for simplicity, we include $\gamma$ as a diagonal element of $W$, and by $g_u(u)$
we mean a vector with components $[g_u(u)]_i = g_u(u_i)$. Making the ansatz
$u = q \xi^\mu e^{-i\omega t}$ + c.c. + higher order harmonics, we have

$$u_i^3 \approx 3 q^2 q^* (\xi_i^\mu)^2 \xi_i^{\mu *} e^{-i\omega t} + \text{c.c.} + \text{higher order harmonics}, \qquad (45)$$
Figure 4: A, B: Input-output relationship when two orthogonal patterns, $\xi^1$ and
$\xi^2$, have been imprinted at the same frequency $\omega_\mu = 41$ Hz. Input $\xi \propto \xi^1 \cos\psi +
\xi^2 \sin\psi$ and response $u \propto \xi^1 \cos\phi + \xi^2 \sin\phi$. Circles show the simulation results,
dotted lines show the analytical prediction for the linearized model. A: Class
I. B: Class II. C: Categorization using different imprinting frequencies. Plotted
are responses of 3 of the 10 excitatory units to various input patterns and
frequencies. Patterns $\xi^1 e^{-i\omega_1 t}$ and $\xi^2 e^{-i\omega_2 t}$, where $\xi^1 \perp \xi^2$, $\omega_1 = 41$ Hz and
$\omega_2 = 63$ Hz, have been imprinted. Matched kernels are used with $\tilde{A}_J(\omega_1) =
0.5 - 0.025i$ and $\tilde{A}_J(\omega_2) = 0.5 - 0.43i$ satisfying resonance conditions. For the
mixed input (fourth column), $a = 1/\sqrt{17}$ and $b = 4/\sqrt{17}$.



and analogously for $u_i^5$. The quantity $q$ is the response amplitude of interest; in
the linearized theory $q \to \chi(\omega;\omega_\mu) I$.
For the two nonlinearity classes (??) we have, respectively,

$$M g(u) \approx \Big( 1 - 3a|q|^2 \sum_j |\xi_j^\mu|^4 \Big) M u, \qquad (46)$$

$$M g(u) \approx \Big( 1 + 3a|q|^2 \sum_j |\xi_j^\mu|^4 - 5b|q|^4 \sum_j |\xi_j^\mu|^6 \Big) M u. \qquad (47)$$

Thus, at a given response strength $|q|$, the imprinting strengths are effectively
multiplied by the factors in parentheses. Consequently, class I nonlinearity
reduces the response at large amplitude, whereas class II nonlinearity enhances
it as long as the quadratic term in $|q|$ is larger than the quartic one.

A consequence for class II is the fact that a system which is very close to
resonance ($\epsilon, \Delta \to 0$) in the linear regime can become unstable at higher response
levels. The system will then jump to a new state in which the (negative) quartic
term in $|q|$ is large enough that stability is restored, as seen in Figs. ?? and ??.

Substituting (46) and (47) into (44) and matching the coefficients of $\xi^\mu e^{-i\omega t}$
on left and right sides, we obtain, for the two nonlinearity classes, respectively,

$$\chi^{-1}(\omega;\omega_\mu)\,q + 3aB \sum_j |\xi_j^\mu|^4\, |q|^2 q = I, \qquad (48)$$

$$\chi^{-1}(\omega;\omega_\mu)\,q - 3aB \sum_j |\xi_j^\mu|^4\, |q|^2 q + 5bB \sum_j |\xi_j^\mu|^6\, |q|^4 q = I, \qquad (49)$$

where $B = \Pi(\omega;\omega_\mu)/(\alpha - i\omega)$. These equations can be solved for $q$. It is
apparent that in general, both the phase and the amplitude of q are modified 
by the nonlinearity. 
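
One way to solve the class I amplitude equation numerically is sketched below. It assumes Eq. (48) has the form $\chi^{-1} q + c\,|q|^2 q = I$, with all pattern-dependent factors lumped into one complex coefficient $c$ (standing for $3aB\sum_j |\xi_j^\mu|^4$). The function name and the numerical values of `chi_inv`, `c`, and `drive` are hypothetical and chosen only to show the procedure: writing $r = |q|^2$ turns the complex equation into a real cubic in $r$.

```python
import numpy as np

def class1_amplitudes(chi_inv, c, drive):
    """Solve chi_inv*q + c*|q|^2*q = drive for the complex amplitude q.

    With r = |q|^2, taking the modulus gives r*|chi_inv + c*r|^2 = |drive|^2,
    a real cubic in r; each positive real root r yields one solution q.
    """
    x, y = chi_inv.real, chi_inv.imag
    cr, ci = c.real, c.imag
    coeffs = [cr**2 + ci**2, 2 * (x * cr + y * ci), x**2 + y**2, -abs(drive)**2]
    roots = np.roots(coeffs)
    radii = sorted(z.real for z in roots if abs(z.imag) < 1e-9 and z.real > 0)
    return [drive / (chi_inv + c * r) for r in radii]

chi_inv = 0.05 + 0.02j   # hypothetical inverse linear susceptibility near resonance
c = 0.3 + 0.1j           # hypothetical lumped coefficient of the cubic term
drive = 0.1              # input amplitude I

solutions = class1_amplitudes(chi_inv, c, drive)
q_linear = drive / chi_inv   # linearized prediction q = chi * I

for q in solutions:
    print("|q| =", abs(q), " phase =", np.angle(q), " linear |q| =", abs(q_linear))
```

For these assumed values the cubic has a single positive root, and the resulting amplitude is smaller than the linear prediction, with a shifted phase. The class II equation (49) can be treated the same way, but it leads to a quintic in $r$ that may have several positive roots, consistent with the hysteresis described for class II.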



5 Effects of synaptic weight constraints 

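Because of the excitatory character of the pre-synaptic unit, $J_{ij}$ and $W_{ij}$ connections
have to be non-negative, a condition not respected by our learning formula
(??) so far. As a remedy, one may (1) add an initial background weight, $J/N$
or $W/N$, independent of $i$ and $j$, to each connection to make it positive, i.e.,

$$J_{ij} = J/N + \sum_\mu 2\,\mathrm{Re}\big[\tilde{A}_J(\omega_\mu)\,\xi_i^\mu \xi_j^{\mu *}\big]/N > 0,$$

$$W_{ij} = W/N + \sum_\mu 2\gamma\,\mathrm{Re}\Big[\frac{\tilde{A}_W(\omega_\mu)}{\alpha - i\omega_\mu}\,\xi_i^\mu \xi_j^{\mu *}\Big]/N > 0, \qquad (50)$$

and/or (2) delete all net negative weights.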

It is clear from equations (50) that adding a background weight is like learning
an extra pattern $\xi^0$ that is uniform and synchronous, with $\xi_i^0 = 1$ for all
$i$, with learning kernels $\tilde{A}_J^0(\omega_0)$ and $\tilde{A}_W^0(\omega_0)$ which satisfy $2\,\mathrm{Re}\,\tilde{A}_J^0(\omega_0) = J$
and $2\gamma\,\mathrm{Re}[\tilde{A}_W^0(\omega_0)/(\alpha - i\omega_0)] = W$. We assume that these kernels are the same
as those with which the patterns $\xi^\mu$ are imprinted, up to an overall learning
strength factor: $\tilde{A}^0_{J,W}(\omega) = \eta \tilde{A}_{J,W}(\omega)$, and that the imprinting frequencies are
the same: $\omega_\mu = \omega_0$. Thus if the $\xi_i^\mu$ are of unit magnitude, in order to guarantee
that no $J_{ij}$ or $W_{ij}$ is negative we need $\eta > 1$.

This strategy can be effective provided that the imprinting of the uniform 
extra pattern does not lead to violation of any stability condition. Since we 
have assumed the imprinted patterns $\xi^\mu$ ($\mu > 0$) are (roughly) orthogonal to $\xi^0$,
we can treat the extra pattern independently of the others, and we just have
to satisfy the same stability conditions for it that we previously found for the
imprinted patterns. That is, the singularities of $\chi(\omega;\omega_\mu)$ have to lie in the lower
half of the $\omega$ plane, where now $\chi(\omega;\omega_\mu)$ (Eqn. ??) has to be computed from a
$\Pi(\omega;\omega_\mu)$ which is a factor $\eta$ larger than before. For $\eta \to 1$, we get no change in
the stability conditions.

Nonnegativity can be more practically achieved by simply deleting the net 
negative weights. For random patterns, and without background weights J and 
W, this leads to deleting half of the weights $J_{ij}$ and $W_{ij}$ obtained from the
learning rule, which weakens their effect, quantified by the function $\Pi(\omega;\omega_\mu)$,
by a factor of 2. In simulations we have found that increasing the learning 
strength by this factor leads to results like those found earlier when negative 
Jij and Wij were permitted. 
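
The two remedies can be tried on a toy weight matrix. The sketch below is an illustration under stated assumptions: a single random pattern with unit-magnitude components, the single-pattern learning rule $J_{ij} = 2\,\mathrm{Re}[\tilde{A}_J \xi_i \xi_j^*]/N$, and a kernel value borrowed from the Fig. 4 example purely as a convenient number. The network size and random seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
a_tilde = 0.5 - 0.025j   # kernel value, used here only as an illustrative number

# One random pattern with unit-magnitude components (random phases).
xi = np.exp(1j * rng.uniform(0, 2 * np.pi, size=N))

# Learned weights J_ij = 2 Re[A_tilde * xi_i * conj(xi_j)] / N for a single pattern.
J = 2 * np.real(a_tilde * np.outer(xi, xi.conj())) / N
frac_negative = np.mean(J < 0)

# Remedy (1): a uniform background weight that offsets the most negative entry.
# For unit-magnitude components, |J_ij| <= 2*|A_tilde|/N, so this value suffices.
J_bg = 2 * abs(a_tilde)
J_background = J_bg / N + J

# Remedy (2): delete negative weights and double the learning strength to compensate.
J_clipped = 2 * np.clip(J, 0.0, None)

print("fraction of negative learned weights:", frac_negative)
print("min weight with background:", J_background.min())
print("min weight after clipping:", J_clipped.min())
```

For a random-phase pattern about half of the learned weights come out negative, which is why clipping halves the effective imprinting strength and why a compensating factor of 2 is applied in the second remedy.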

Finally, we remark that negative weights can also be simply implemented 
via inhibitory interneurons with very short membrane time constants. 

6 Summary and Discussion 
6.1 Summary 

We have presented a model of learning and retrieval for associative memory or 
input representation in recurrent neural networks that exhibit input-driven os- 
cillatory activities. The model structure is an abstraction of the hippocampus 
or the olfactory cortex. The learning rule is based on the synaptic plasticity 
observed experimentally, in particular, long-term potentiation and long-term 
depression of the synaptic efficacies depending on the relative timing of the 
pre- and postsynaptic spikes during learning. After learning, the model's re- 
trieval is characterized by its selective strong responses to inputs that resemble 
the learned patterns or their generalizations. Our work generalizes the outer- 
product Hebbian learning rule in the Hopfield model to network states charac- 
terized by complex state variables, representing both amplitudes and phases. 
Our work differs from previous modeling in the following respects: (1) We allow
stored patterns to vary in both amplitudes and phases, as well as in oscillation
frequency. (2) We imprint input patterns into the synapses using a generalized
Hebbian rule that gives LTP or LTD according to the relative timing of pre- and 
postsynaptic activity. (3) We explore two qualitatively different functions of the 
network: one (associative memory) is to classify inputs into distinct categories
corresponding to the individual learned examples, and the other is to represent
inputs as interpolations between or generalizations of learned examples. 

The same model structure was used previously, with a conventional Hebbian 
rule with $A_{J,W}(\tau) \propto \delta(\tau)$, by two of the authors in a model for odor
recognition/classification and segmentation in the olfactory cortex [10]. The principal
new contributions in the current work are (1) linking the model with the recent
experimental data on neural plasticity and LTP/LTD and dissecting the role of
the functional form of the learning kernel $A_{J,W}(\tau)$ in determining the selectivity
to input patterns and frequencies, (2) an extended analysis of input selectivity 
and tuning, (3) exploration of the two different computational functions (as- 
sociative memory and input representation) of the model, and (4) a detailed 
analysis of nonlinearity in the model. 



6.2 Discussion 

By using both amplitude and phase to code information, it is possible either to 
encode additional information or to increase robustness by redundantly coding 
the same information coded by the amplitudes. Indeed, hippocampal place 
cells, which code the spatial location of the animal, fire at different phases of 
the theta wave depending on the location of the animal in the place fields [??].
In this case, the information encoding is redundant since the location is in 
principle already encoded by the firing rates (i.e., oscillation amplitudes) in the 
neural population. In our model, combined phase and amplitude coding requires 
matching both the amplitude and phase patterns of the inputs with the learned 
inputs under recall, making matching more specific. This scheme necessitates 
learning both excitatory-to-excitatory connections and excitatory-to-inhibitory 
ones. Thus, in a system of $N$ coupled oscillators, the stored items are coded
by $2N$ variables ($N$ amplitudes and $N$ phases), requiring the specification
of $2N^2$ synaptic strengths: $N^2$ excitatory-to-excitatory synapses and another
$N^2$ excitatory-to-inhibitory ones. Omitting phase coding would require learning
of only $N^2$ synapses, e.g., of the excitatory-to-excitatory connections, as in
previous models [10].






Our model's frequency selectivity adds additional matching specificity during 
recall. Furthermore, frequency matching can modulate the spiking timing reli- 
ability, since higher or lower oscillation amplitudes, caused by better or worse 
frequency matching, should make the firing probabilities of the cells more or 
less modulated or locked by oscillation phases. Frequency dependence of spike
timing reliability has been observed in cortical pyramidal cells and interneurons
[17]. In our model, the frequency tuning is a network property imprinted in
long-range connections, although frequency tuning as a resonance phenomenon 
could in principle exist in a single neural oscillator or a local circuit. 

In our model, both excitatory-to-excitatory and excitatory-to-inhibitory syn- 
apses are modifiable. Experimentally, there is as yet little evidence concerning 
plasticity in pyramidal-to-interneuron synapses. More experimental investigations
are needed. In experiments by Bell [??], plasticity of the excitatory-to-inhibitory
synapses between parallel fibers and medium ganglion cells in the
cerebellum-like structure of the electric fish has been observed, although these
synapses are not part of a recurrent oscillatory circuit.

We explored the constraints on the learning kernel functions $A_{J,W}(\tau)$ imposed
by the requirement of a resonant response. A condition that came up in
almost all the variants of the model that we explored was that $\tilde{A}'_J(\omega_\mu)$ should be
positive in order to achieve a strong, narrow resonance. This means, roughly, 
that for excitatory-excitatory synapses LTP should dominate LTD in overall 
strength, for spike time differences smaller than 

Another condition we considered was that the resonant frequency should 
be the same as the driving frequency $\omega_\mu$ during learning. We saw that for
real patterns and learning only of the excitatory-excitatory connections, this
could not be satisfied for general $\omega_\mu$. However, with learning of the excitatory-
to-inhibitory connections it could, for a suitable (negative) value of 
For complex patterns (see Eqn. ??), the imaginary parts of both $\tilde{A}_J$ and $\tilde{A}_W$
contribute to the shift, so if they have opposite signs (of the correct relative
magnitude) the condition can be satisfied, independent of $\omega_\mu$. These features
should be looked for in investigations of plasticity of excitatory-to-inhibitory 
synapses. 

An interesting property we have identified in the model is its ability to 
subserve two different computational functions: to classify inputs into distinct 
learned categories, and to represent input patterns as interpolation and gener- 
alizations of the prototype examples learned. 

Categorization is appropriate for associative memories and has been applied 
in our previous model of olfactory cortex [10]. In this context, interpolation
between different learned patterns is not desired; individually learned odors
should have specific roles. It is more desirable to perceive individual odors
within an odor mixture than to perceive an unspecific blend. 

On the other hand, interpolation is advantageous in some circumstances. 
Consider an animal learning an internal representation of a region of space. 
If particular spatial locations are represented as particular imprinted patterns, 
then locations in between them will be represented as linear combinations of 
these patterns. Thus, the network is able to represent a continuum of positions 
in a natural way. Hippocampal place cells seem to employ such a representa- 
tion. A network that interpolates can generalize from the learned place fields 
to represent spatial locations between the learned place fields by superposition 
of the neural activities of the place cells. Because the place fields are localized, 
the generalization is conservative (and thus robust): It does not extend beyond 
the spatial range of the learned locations or to regions between distant, disjoint 
place fields. 

We showed that our network can serve one or the other of these two com- 
putational functions, depending on the nonlinearity in the neuronal activation 
functions. Class I leads to the interpolation or input representation operation 
mode, while class II leads to categorization. The form of g(u) could be subject to 
modulatory control, permitting the network to switch function when appropri- 
ate. The switch could even be accomplished, for a suitable form of g(u), simply 
by a change in the DC input level, since it is possible to change the effective
nature of the nonlinearity near the operating point by shifting the resting point. 
It seems likely to us that the brain may employ different kinds and degrees of 
nonlinearity in different areas or at different times to enhance the versatility of 
its computations. 

We have seen that it is possible to store different classes of patterns at differ- 
ent oscillation frequencies, and that the network does not interpolate between 
patterns stored at different frequencies. This feature gives the system the pos- 
sibility of performing several different forms of input representation or catego- 
rization without interference between them. For instance, all place fields could 
be stored at one frequency, while odor memories could be stored at another, 
and there would be no crosstalk between the two modalities if the frequencies 
differed by much more than the resonance linewidth. 

In conclusion, we have seen that this rather simple network is endowed with 
interesting computational properties which are consequences of the combination 
of its oscillatory dynamics and the spike-timing-dependent synaptic modification
rule. Although experiments to date have not clearly uncovered examples of 
networks in the brain that function in just this fashion, we hope that our findings 
here will stimulate further investigations, both theoretical and experimental. 